Statistical inference from a single sample of data
Statistical inference from multiple samples of data
Sampling distributions of sample estimates are (often) determined using the Central Limit Theorem
Standard error as an estimate for the standard deviation of a sampling distribution
Margin of error
Critical values corresponding to particular confidence levels
Confidence intervals provide an (interval) estimate for the effect of interest. Hence, confidence intervals, and not hypothesis tests, can inform us about the effect size. This makes it easier for us to compare the results of our statistical analysis to a practical understanding of what kind of “effect” would be important in a particular problem setting.
Practical significance is not determined by statistical significance! Statistical significance is not determined by practical significance!
Hypotheses are typically designed so that what we want to prove is expressed in the alternative. For all of the methods that we’ve covered thus far, the null hypothesis is always going to be of the form \[H_0: \text{<parameter> } = \text{ some number}\]
The only way to reduce both types of error is to collect more evidence or, in statistical terms, to collect more data.
\(\alpha = Pr(\text{Type I error})\): If \(H_0\) is true, this is the probability that we (incorrectly) reject it.
\(\beta = Pr(\text{Type II error})\): If \(H_0\) is false, this is the probability that we (incorrectly) fail to reject it.
\(1-\beta = \text{Power}\): If \(H_0\) is false, this is the probability that we (correctly) reject it.
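As a rough illustration of how collecting more data lowers \(\beta\) while \(\alpha\) stays fixed, the sketch below uses base R's `power.t.test()` for a two-sample t-test. The true difference (8) and the common standard deviation (20) are hypothetical values chosen only for illustration; they are not taken from any data set in these notes.

```r
# Hypothetical inputs (assumptions, not from the notes):
# a true difference in means of 8 and a standard deviation of 20 in each group.
sample_sizes <- c(20, 50, 100, 200)

# Power of a two-sided, two-sample t-test at alpha = 0.05 for each sample size
power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 8, sd = 20, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
})

# beta = Pr(Type II error) = 1 - power
power_table <- data.frame(n_per_group = sample_sizes,
                          power = round(power_by_n, 3),
                          beta = round(1 - power_by_n, 3))
power_table
```

With \(\alpha\) held at 0.05, the power column should rise and the \(\beta\) column should fall as the per-group sample size grows.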
The logic of hypothesis tests is similar to the logic behind inter-universe travel in the movie Everything Everywhere All at Once…
Two independent groups
Two paired groups of data
On average, how much more money do consumers spend at Target compared to Walmart?
Suppose researchers collected a systematic sample of \(85\) Walmart customers and \(80\) Target customers by asking them for their purchase amount as they left the stores. The data they collected are summarized in the table below. Suppose a computer has already calculated the degrees of freedom to be \(162.75\).
| | Walmart | Target |
|---|---|---|
| \(\bar{x}\) | \(\$45\) | \(\$53\) |
| s | \(\$21\) | \(\$19\) |
Step 1) Identify and define the population parameter and choose your confidence level.
Step 2) Calculate the sample estimate for the population parameter.
Step 3) Assess the required assumptions and conditions.
Step 4) Find the critical value corresponding to your confidence level.
Step 5) Calculate the standard error of your sample estimate.
Step 6) Calculate the lower and upper bounds of your confidence interval.
On average, how large is the difference in car insurance prices for customers of an online insurance company versus customers of a local insurance company?
Find a \(95\%\) confidence interval for the mean difference in insurance prices based on the data given below.
mean(insurance_diff$PriceDiff)
## [1] 45.9
sd(insurance_diff$PriceDiff)
## [1] 175.6628
Week 14 - new statistical method (our last one for the semester)
Week 14 and 15 - begin discussions on ethical statistical practice
Week 15 - Friday in-class poster presentation
You and your group mates are welcome to attend anytime between 9:30am-10:30am or 11:00am-12:30pm.
Plan to spend at least 45 minutes in class and come early to hang up your poster in the room. (Prof Suzy will provide hanging supplies.)
Prof Suzy will take turns meeting with each group for 5-7 minutes where you will present your topic.
All participants will be asked to take some time to read the other posters. Each person will need to submit a 3-4 sentence summary of another group’s project in order to get credit for attendance this day.
Step 1) \(\mu_1 - \mu_2 =\) mean amount spent at Target minus mean amount spent at Walmart. We’ll use a 95% confidence level.
Step 2) \(\bar{x}_1 - \bar{x}_2 = 8\)
Step 3) Assess the required assumptions and conditions - done in class.
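Step 4) We need the critical \(t^*\) value corresponding to a 0.95 confidence level from a Student’s t distribution with \(162.75\) degrees of freedom. We can find this exactly using R, and the value should be close to the approximate critical value that you can read off the t-table. The call below returns the lower 2.5% quantile, which is negative; because the t distribution is symmetric, the critical value is its absolute value, \(t^* \approx 1.975\).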
qt(0.025, df = 162.75, lower.tail=TRUE)
## [1] -1.974647
Step 5) \(SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{19^2}{80} + \frac{21^2}{85}} = 3.115\)
Step 6) \(8 \pm (1.975)(3.115) = [\$1.848, \$14.152]\), with interpretation given in class.
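The six steps for the Target versus Walmart example can also be carried out in R directly from the summary statistics. This is a minimal sketch, not the method used in class: it assumes the degrees of freedom come from the Welch-Satterthwaite approximation, which does reproduce the \(162.75\) given in the problem. The variable names are made up for illustration.

```r
# Summary statistics from the table (group 1 = Target, group 2 = Walmart)
xbar1 <- 53; s1 <- 19; n1 <- 80
xbar2 <- 45; s2 <- 21; n2 <- 85
conf_level <- 0.95

# Step 2: sample estimate of mu_1 - mu_2
estimate <- xbar1 - xbar2

# Step 5: standard error of the difference in sample means
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# Degrees of freedom (ASSUMPTION: Welch-Satterthwaite approximation)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Step 4: critical value; the upper-tail quantile is already positive
t_star <- qt(1 - (1 - conf_level) / 2, df = df)

# Step 6: lower and upper bounds of the confidence interval
margin <- t_star * se
ci <- c(lower = estimate - margin, upper = estimate + margin)
ci
```

The bounds may differ from the hand calculation in the third decimal place because the hand calculation rounds \(t^*\) and the standard error before multiplying.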
Step 1) Identify and define the population parameter and choose your confidence level.
Step 2) Calculate the sample estimate for the population parameter.
Step 3) Assess the required assumptions and conditions.
Step 4) Find the critical value corresponding to your confidence level.
Step 5) Calculate the standard error of your sample estimate.
Step 6) Calculate the lower and upper bounds of your confidence interval.
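For paired data such as the insurance price differences, Steps 2 and 4 through 6 reduce to a one-sample t interval on the differences. The sketch below wraps those steps in a hypothetical helper, `paired_ci()`, which is not part of the course materials. The sample size is not stated in these notes, so `n_placeholder` is a stand-in value only; replace it with `length(insurance_diff$PriceDiff)` when working with the real data.

```r
# Hypothetical helper (not from the course materials): t interval for a mean
# of paired differences, computed from summary statistics.
paired_ci <- function(xbar_d, s_d, n, conf_level = 0.95) {
  se <- s_d / sqrt(n)                                   # Step 5: standard error
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)    # Step 4: critical value
  margin <- t_star * se                                 # margin of error
  c(estimate = xbar_d, se = se, t_star = t_star,
    lower = xbar_d - margin, upper = xbar_d + margin)   # Step 6: bounds
}

# The mean and standard deviation are the R outputs shown in the example.
# ASSUMPTION: n_placeholder is NOT the real sample size; it is a stand-in.
n_placeholder <- 10
result <- paired_ci(xbar_d = 45.9, s_d = 175.6628, n = n_placeholder)
result
```

If the raw differences are available, `t.test(insurance_diff$PriceDiff, conf.level = 0.95)` should give the same interval without needing the helper.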